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Nowadays, with the evolution of digital video broadcasting, as well as, the 
advent of high speed broadband networks, a new era of TV services has 
emerged known as IPTV. IPTV is a system that employs the high speed 
broadband networks to deliver TV services to the subscribers. From the 
service provider viewpoint, the challenge in IPTV systems is how to build 
delivery networks that exploits the resources efficiently and reduces the 
service cost, as well. However, designing such delivery networks affected by 
many factors including choosing the suitable network architecture, load 
balancing, resources waste, and cost reduction. Furthermore, IPTV contents 


characteristics, particularly; size, popularity, and interactivity play an 
important role in balancing the load and avoiding the resources waste for 
delivery networks. In this paper, we investigate the problem of resource 
allocation for IPTV delivery networks over the recent architecture, peer- 
service area architecture. The Genetic Algorithm as an optimization tool has 
been used to find the optimal provisioning parameters including storage, 
bandwidth, and CPU consumption. The experiments have been conducted on 
two data sets with different popularity distributions. The experiments have 
been conducted on two popularity distributions. The experimental results 
showed the impact of content status on the resource allocation process. 
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1. INTRODUCTION 

IPTV, as a promising technology, is growing rapidly in terms of subscribers and revenue to become 
in the near future the standard means to deliver home and business entertainment contents [1]. Thus, 
Telecommunication companies have entered a hectic competition to increase their customer base and profit 
by delivering the IPTV services [1], [2]. On the other hand, the key concern of service providers in this hectic 
competition is how to provide high quality service with a minimum cost. 

Consequently, the cost reduction is one of the service providers’ worries. Such cost reduction can be 
achieved by controlling the resources allocation process [3]. Resource Allocation is an important issue in 
designing the shared delivery network. Resource allocation determines the locations and the amount of 
resources to minimize the cost under certain constraints. In the shared delivery networks, many service 
providers may share resources and compete to deliver good quality services. For that, these resources should 
be shared in a more effective manner [4]. However, IPTV systems suffer from the sudden peak workloads. 
Thus, allocating a large amount of resources to cope with the sudden workload leads to low resources 
utilization at non-peak hours. On the other hand, non-considering the peak workload leads to user 
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dissatisfaction during the peak hours. For that, pay-as-you-go scheme is suggested by some delivery network 
providers. Amazon CloudFront is a global example of this scheme. In this scheme, the service provider has to 
pay for the used resources only without any upfront, commit, or service contract. 

Particularly, the fluctuation in the workload (i.e. the number of incoming requests targeting the 
hosted contents) is a serious issue threatens IPTV systems. Such workload fluctuation issue figures out the 
content status in IPTV system. Gaber and others [5], [6], [7] suggested a contents’ workload estimation 
model that estimates the contents’ status according to their characteristics. Of these characteristics are huge 
size (i.e. in Mega Bytes or Giga Bytes), interactivity, and rapidly changing effective period [8]. Such distinct 
characteristics make the design of IPTV delivery networks challengeable. On the other hand, ignoring the 
status of diverse contents leads to load imbalance and waste of resources [9]. Therefore, estimating the 
content’s status according to these characteristics is a crucial factor in IPTV systems. Content status 
modeling refers as estimating the portion of concurrent requests that target the content according to its 
characteristics. Such modeling helps a lot in handling both the load balancing and resource allocation based 
on the anticipated load of contents [5]. 

By means of such contents’ status estimation model, the service provider can control the resources 
allocation according to the content status and the users’ demand to reduce the hosting cost [10] without 
affecting the users’ satisfaction. This integration allows the service provider to adjust the required resources 
according to the necessitated replication scheme. For that, Laoutaris et al. [11] argue that replica placement 
problem and resource allocation problem should be solved jointly due to the dependency between them. 
Moreover, considering the contents’ status during the integration gives the service providers the possibility to 
control the resources and content distribution according to the demand fluctuation, as well as, enables the 
delivery network to absorb the sudden load fluctuation. 

Therefore, this paper concerns about how to build content-aware resource allocation model that 
considers the content status to reduce the waste of resources. In this paper, Content-Aware Resource 
Allocation Model (CARAM) is proposed. CARAM model considers the contents’ status including contents’ 
load and contents’ popularity distribution to decide the required resources. CARAM model allocates the 
resources according to the replication strategy. Thus, it can save the resources in terms of storage, bandwidth, 
and CPU. For this purpose, the resource allocation problem and the replica placement problem are integrated 
to allocate the resources depending on the necessitated replication scheme. The content-aware resource 
allocation problem is studied and formulated mathematically as a constrained binary optimization problem. 
Next, this problem is solved using Hybrid Genetic Algorithm aiming to evaluate the effect of content status 
on the content-aware resource allocation. For this purpose, two popularity distributions are considered 
through this paper. These popularity distributions are Zipf’s distribution and the exponential distributions, 
which follow the normal density function [12]. 

The rest of this paper is organized as follows. Section 2 is dedicated for background and related 
works. Section 3 presents the system model and problem formulation. Section 4 introduces the proposed 
hybrid Genetic algorithm to solve this problem. Section 5 is dedicated for experimental results and 
discussion. Finally, Section 6 presents the conclusion and future work. 


2. BACKGROUND AND RELATED WORKS 

Although the high benefits of IPTV delivery networks to both service provider and Content 
consumer, designing efficient delivery networks is not a trivial task and has potential problems, which must 
be solved. IPTV delivery networks composition, Content distribution and management, resource allocation, 
and request redirection are the main classes of issues that must be considered during building the delivery 
networks. IPTV delivery networks composition includes delivery network placement, determining the 
servers’ specifications, and interaction among servers. Content distribution and management includes content 
selection and distribution based on demand and cache/replica management. Resource allocation chooses the 
type and size of resources that are required for each surrogate server. Request redirection includes the 
techniques to distribute the requests among servers in a manner to maintain the load balanced and avoid the 
traffic congestion [4], [13]. 

Content placement is an important point in the delivery networks for multimedia streaming. In large 
multimedia systems (e.g. IPTV), the effective content placement is highly required to cope with load 
imbalance, insufficient bandwidth, large delay, and packet loss [14]-[16]. In the context of content 
placement, a considerable number of works are proposed to cope with the growing demand on these contents. 
On the other hand, Replica placement Problem, as a global strategy, is widely investigated in the literature 
review to improve the overall performance and/or minimize the allocation cost [17]. In the literature, the 
Replica Placement problem is formally expressed as a problem definition (cost function), which moves 
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toward an overall goal (performance or cost improvement). Such problems typically require heuristic 
algorithms to find approximate solutions within a feasible time, due to that they are NP-complete [18],[ 19]. 

In the context of problem definition, the researchers investigate the content replication problem from 
two perspectives. The first perspective is to improve the quality of service [20]-[23]. The second perspective 
is to minimize the delivery cost [11], [12], and [24]-[28]. These works can also be classified according to the 
solution proposed. These solutions can be classified into heuristic solutions and evolutionary solutions. 
The heuristic solutions include the approximation algorithms and the greedy algorithms. In contrast, the 
evolutionary algorithms include Genetic, ant colony, and particle swarm algorithms. 

Because of the Replica Placement Problem is NP-Complete [29]-[31], they typically require meta- 
heuristic algorithms to find approximate solutions within a feasible time [18]. Employing such meta-heuristic 
algorithms help in gaining the maximum benefit of replication that is crucial for the success of the IPTV 
systems. However, the aforementioned works, which exploit heuristic algorithms, fail to produce an optimal 
effective placement strategy [30], [31]. Thus, due to the fruitful application of evolutionary algorithms to 
solve a variety of optimization problems, many attempts are recorded in the direction of using these 
algorithms in solving replica placement problem. Khan and Ahmad [32] compared the heuristic algorithms 
with evolutionary algorithms (Genetic Algorithm as an example). They introduced a unified cost model that 
captures the minimization of the total object transfer cost in the system, which in turn, leads to effective 
utilization of storage space, replica consistency, and fault-tolerance. The study shows that Genetic Algorithm 
outperforms the heuristic algorithms (e.g. greedy algorithms) in terms of solution quality. Although, 
Evolutionary algorithms produce better solutions as close as possible to the optimal solutions, the heuristic 
algorithms are faster than evolutionary algorithms in the running time. These findings also confirmed by the 
authors in [31], [33]. 

Resource allocation is an important process in designing IPTV delivery networks. This process 
consists of determining the location, the type, and the amount of networking resources to deploy. The aim of 
the resource allocation problem is to minimize the cost while certain constraints are respected [19]. Thus, the 
optimization strategies are a good option to solve such problems [34]. In the literature review, the resource 
allocation problem has been addressed using heuristic solutions [35]. Wauters et al. [21] address the resource 
allocation problem of determining the equipment required for building transport network for VoD services. 
They focused on determining the number of ports of each server, as well as, the number of multiplexers and 
switch ports at each node. In their study, the decentralized network is divided into regional sub networks with 
a ring topology. Next, a single server is installed on each regional sub network. Similarly, Kitjongthawonkul 
and Ko [36] examined the resource allocation problems in a VOD network to find the optimal locations of 
the video servers. They considered the users’ demand to select the location of those servers. Moreover, the 
authors addressed the allocation problem using the dynamic programming technique. The objective of 
Kitjongthawonkul and Ko is to minimize the total operating cost, including transmission cost, storage cost, 
and server installation cost. 

Allocating the resources to build an overlay network over multiple substrate networks is proposed 
by Houidi et al. [37] and Pandey et al. [38]. Houidi proposed a dynamic overlay network creation model 
according to the load and the available resources. The proposed model composed of request splitting 
algorithm and embedding algorithm. The request splitting algorithm aims at helping the overlay network 
provider to distribute the incoming requests among the available substrate network providers. The authors 
proposed two requests splitting algorithms; max-flow min-cut algorithm and linear programming algorithm. 
Based on the splitting algorithm, embedding algorithm is proposed to assign virtual nodes and links into each 
substrate network provider simultaneously. In this work, the authors formulated the overlay network 
embedding process to be solved by means of mixed integer program. The aim of the proposed embedding 
algorithm is to increase the request acceptance rate and decrease the leasing cost of infrastructure. 

Pandey et al. [38] focused on determining the optimum network deployment strategy for VoD 
services in a heterogeneous networking environment. The deployment decisions can be made considering the 
following parameters: the required number of VoD servers, the minimum physical distance of server from the 
community and the bandwidth capacity requirement. The authors’ aim is to determine the minimum capacity 
requirements that maintain the minimum requirement of Quality of Experience (QoE) to be met. For that, 
they estimated three QoE metrics: the server waiting time, the minimum one-way delay, and the access 
network bandwidth consumption. Such estimation is done based on the analysis of user requirements and 
network configurations. Afterwards, they formulated the server location problem and path selection problem 
jointly as an Integer Linear Programming (ILP) problem. The authors solved this problem by means of ILP 
techniques, as well as, heuristics. 

One of the flaws in most proposed works to solve the resource allocation and/or replica placement 
problem is that they consider the problems of resource allocation or replica placement problem 
independently. Solving these two problems independently produces suboptimal solutions; this is due to the 
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direct effect of replicas locations on the amount of required resources and vice versa [11], [39]. The main 
advantage of combining replica placement and resource allocation in one joint problem is that the resources 
will be utilized efficiently and as necessary. Once these resources become no longer required, they will be 
released to be available for other users who require it or able to pay for it. For that, integrating the replica 
placement problem with the resource allocation is suggested firstly by [11]. Laoutaris et al. [11] argue that 
the replica placement problem and the resource allocation problem should be solved jointly to avoid a 
suboptimal solution. They proposed a realistic framework to build optimal resource/ content allocation model 
for a hierarchical architecture that minimizes the total cost. The cost here is represented by the distance 
between the storage location and the end user. The framework of Lautaris took into consideration the load 
balance and request peering among servers as constraints. In this framework, Genetic Algorithm is exploited 
to solve the resource allocation problem. Nguyen et al. [40] addressed the problem of overly network 
provisioning for multimedia contents to build a cost-effective virtual delivery network. Their framework aims 
to build an overly delivery network over a substrate network. They formulated the provisioning problem as a 
combination between content replication and resource allocation. Next, they solved this provisioning problem 
using a Lagrangian relaxation heuristic algorithm. 

Similarly, Nakaniwa et al. [41] introduced a mathematical model to join the content replication with 
server placement for the hierarchical architecture. Their aim was to maximize the system reliability of 
delivery network. They introduced an integer programming technique to maximize the reliability of the 
system. This technique finds the optimal scheme for both content and resource allocation problems that 
maximize the reliability of all the paths from each user to each file location. They applied their model on a 
real delivery network for internet service provider in Japan called "B Bit-Japan" with subject to delay and 
storage and/or transmission cost constraints. 

On the other side, Aldana Diaz and Huh [42] argue that allocating the IPTV resources including 
storage and bandwidth requires creating a cost function based on contents management and distribution. 
They proposed three cost-based policies, particularly, local hosting, external hosting, and partial hosting 
policies for 3rd party content service on a distributed architecture. In their detailed cost function, they 
considered the storage and bandwidth requirements that maintain the least level of Service Level Agreement 
(SLA). The results of them showed that partial hosting of contents is the most suitable technique wherein 
some of the contents are hosted locally while the others can be requested upon the demand. Such partial 
hosting may maintain the trade-off between the storage cost and bandwidth cost. However, this work lacks 
for real constraints on which contents should be hosted, and, which contents should be requested upon 
demand. Li and Wu [12] proposed a heuristic algorithm that finds the optimal number of popular contents 
and the optimal number of servers to store these contents. Their model ignores the contents' replication and 
the load balancing. 

Replicating the contents according to their status helps the service provider to decide on the required 
resources to allocate. Thus, considering the content status is an important point to achieve optimal resource 
allocation scheme depending on the replica placement. Therefore, in this paper, the content-aware resource 
allocation model is proposed aiming at investigating the impact of content status on resource allocation. 


3. CARAM FRAMEWOK 

In traditional delivery networks, as CDN, the service provider has to allocate the contents according 
to long-term service contract with a certain commitment. This pricing scheme hinders the service provider to 
cope with the sudden workload at the peak busy times. On the other side, this scheme leads to waste the 
resources due to the low utilization at the non- peak busy times. For that, pay-as-you-go scheme is suggested 
by the delivery network providers. Amazon CloudFront is a global example for pay-as-you-go scheme; in 
which, the service provider has to pay for the used resources only without any upfront, commit, or service 
contract. Particularly, the service provider has to control the resources allocation according to the demand to 
minimize the hosting cost [10] without affecting the user satisfaction. 

Therefore, the aim of Content-Aware Resource Allocation Model (CARAM) is to decide on the 
resources amount and the content replication scheme according to the demand so as to achieve a minimum 
cost subject to given constraints. The cost, here, is expressed in terms of the amount of payment to the 
network provider to host the contents. Thus, allocating the storage and bandwidth resources can be performed 
in conjunction with replica placement. The main advantage of combining replica placement and resource 
allocation in one joint problem is that the resources will be utilized efficiently and as necessary [10]. Once 
these resources become no longer required, they will be released to be available for other users who require it 
or able to pay for it. 

Content-Aware Resource Allocation problem focuses on determining the server locations in each 
service area, along with, the processing power, storage space, and bandwidth capacity at each location. 
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To achieve an optimal solution for this problem, the service provider has to obtain the size and popularity 
distributions of contents in each service area. Consequently, the replication degree, the popularity 
distribution, and the expected load of contents along with their requirement in terms of processing power, 
storage, and bandwidth must be estimated clearly. Replicating the contents according to their requirement 
helps the service provider to decide the required resources to allocate. Thus, this interdependency emphasizes 
the need to address resources allocation and replica placement jointly. 

Content-aware Resource Allocation as an optimization process should take into account the 
available resources and their cost, as well as, the contents and their characteristics as follows. 

a. Available resources: In the context of shared hosting environment, each resource provider advertises “for 
hire” resources with information of resource type, location, and cost. The processing power, storage 
space, and bandwidth capacity of servers are also provided. In such environment, the service provider can 
choose different resources, which belong to different resources providers. 

b. Contents requirement: each video in IPTV systems requires a storage space, processing power, and 
bandwidth capacity proportional to its popularity. Thus, replicating the contents according to their 
popularity distribution gives the optimal replication scheme, which in turn optimizes the resource 
allocation process. 

The information about the available resources, as well as, the requirements is gathered from the 
resource providers and service providers respectively. The service provider is responsible for doing the 
allocation process based on the requirements. Upon completion the allocation process, the service provider 
sends the process results to the resource provider to create the topology. In case of the existence of resource 
broker, the resource broker can complete the whole process according to the given information from both 
parties. The flow of Content-Aware Resource Allocation Model (CARAM) is depicted in Figure 1. 


Service Provider Domain Replication Patter 


Estimated Content 
Demand 


Resources Replica 
dimensioni Placement 


Network 
topology 


VDN Topology 


Available Resources Network Provider Domain 


Figure 1. CARAM framework 


As depicted in Figure 1, in CARAM model, the information on both the content popularity 
distribution and the demand of each video are gathered from the service provider. This information on the 
resources from each physical network is provided by the resources providers. Then, the desired optimization 
tool is employed to compute the optimal topology including the replication pattern. Lastly, the resultant 
topology is passed to the resources providers or the resource broker to create the delivery network. 


4. CONTENT-AWARE RESOURCE ALLOCATION PROBLEM FORMULATION 

Content-aware Resource Allocation problem over the peer-service area architecture can be modeled 
as a set of inter-connected service areas. Each service area contains a set of servers, which can be physical 
servers or virtual servers. Each server is expressed by its processing power, storage space, and bandwidth 
capacity. 
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This paper considers a delivery network comprising of A interconnected service areas with 
PL, potential locations in each service area. Each potential location is with known storage capacity mr ; 


processing power uj", bandwidth bj"“*, and Load capacity Lj***. Moreover, each potential location 


advertises the cost of storage unit $}, bandwidth unit fj, and CPU consumption unit 6;. Let there be a set of 
Video contents N. The size of each object is denoted by m; and measured in Megabytes (MB). Each object i 
is associated with its processing power u;, bandwidth b;, expected load L;, and popularity value p; € [0,1]. 
Each object has a possibility to be replicated among the service area or requested from another service area 
based on its popularity and the trade-off between the hosting cost and redirection cost. In the cost estimation 
stage of Content-Aware Resource Allocation Model (CARAM), it is necessary to distinguish between two 
terms server initiation cost (C,) and replication pattern cost (Cp). 


4.1. Server initiation cost (C,) 

In CARAM, the service provider has the ability to lease the resources from different physical 
network providers. Each physical network provider has own pricing plan for leasing the resources. The 
pricing plan includes the cost of configuring the server at the first time, deploying software, and maintaining 
the servers including the backup and security [40]. For instance, in the virtual networks, the leasing cost CL 
for the virtual network is the cost of initiating and deploying the candidate locations, which are selected as a 
virtual server in all service areas, as in (1). The cost C;represents the leasing cost of a server j in the service 
area a. 


Pla 


CG, = Dhar Èj-1 Yaj * Cj (1) 


Furthermore, yj is a binary indicator to indicate whether the potential location is selected yj, = 1 or 
not Yja = 1. To facilitate the communication among the service areas, the service areas are connected via 
inter-service areas links. These links enable the dispatcher in any service area to redirect the requests 
targeting the unavailable contents to other service areas. The cost of these links can be obtained from the 
redirection process in the replication pattern as will be explained in the next sub-section. 


4.2. Replication pattern cost (Cp) 

The contents replication represents a challenge in CARAM. In this step, the content replication cost 
must be envisaged. The replica placement problem should be extended to include the cost of other resources. 
The resources cost includes the cost of storage, processing power, and bandwidth for hosting the popular 
contents. Besides, the cost of contents redirection is included. Redirection cost determines the cost of 
requesting unpopular content from other service areas in case of not replicating it due to the resources 
limitations. The following equation explains the replication pattern cost based on the above illustration. 


mi 


CBR; 


Cr = Daca Ljertg Lien Xija(mS; + uij + b:B;) + Xaca Djertg Lien Pi(1 — Xija) Ba (2) 


The first part of (2) depicts the hosting cost of contents inside the service area including the storage 
cost m;Sj, the processing power consumption cost cj6;, and the bandwidth cost b;ßj. On the other hand, the 
second part shows the transmission cost of the streams that have to be redirected for non-existent contents 
inside the service area. To estimate the transmission cost, the number of streams is calculated firstly by 
dividing the size of redirected object m; by the bit rate CBR. Next, the bandwidth unit cost of the service area 
Pa is provided and multiplied by the number of streams for redirected objects to calculate the redirection 
cost. The inclusion of popularity value p; of redirected object in the second part of this equation is important 
to control which contents should be redirected. Only the low popular contents can be redirected due to their 
low redirection cost. 

Upon completion of CARAM process, the optimal topology can be identified through the following 
components: 

1. Servers and Resources— A server is defined through its location, the allocated computation power, storage 
space and bandwidth. The server’s bandwidth is allocated for delivering the contents to the end-users. 
Thus, it is dimensioned based on the aggregated amount of requests targeting the contents stored in this 
server. 

2. Replication Pattern- The contents are replicated according to their popularity and expected load such as 
the popular contents can be replicated in the service area while, the low popular contents can be 
redirected to the other service area. For that, the replication pattern determines the replicated contents, the 
replication degree, and the redirected contents, as well. 
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4.3. Decision variables 
In fact, there are two binary decision variables are introduced to formulate the content-aware 


resource allocation problem. The first binary variable Y; a is employed to decide which potential location 
must be belonged within the optimal topology while, the second one X; ja to govern the replication pattern. 


a; = {i if site j in service area a is selected as a server (3) 
es NO) otherwise 

PES [i if content i is replicated into a server j inservice area a (4) 
ne 0 otherwise 


In fact, the two binary decision variables are related to each other and affected by each other. 
For instance, these two variables determine the potential locations that should be selected according to the 
minimum cost of the replication scheme, at the same time; they decide whether the contents can be replicated 
in a potential location or not. Such relation supports the argument that the network topology, resources 
dimensioning, and content replication should be conducted mutually due to the mutual interaction between 
them. 


4.4. The proposed optimization model 
The resources allocation problem as a constrained binary optimization model can be formulated 
mathematically as explained in (5-11). 


Minimize: 

Cr = C+ Cp (5) 
Subject to 

Lien MXija S mrt vj €PLg,VaEA (6) 
Pien UiXija S Uj” vj Ee PLaYaEA (7) 
Dien biXija < bj” Vie PLa,WaeA (8) 
Pien LiXija SL Vi E Ply, VaEA (9) 
È jePLa Xija S P ViEN,VaEA (10) 
È jePLa Xija = Pi * P ViEN,VaEA (11) 
Xija E [0,1] WiEN,Vj/EPL,,VaEA (12) 
Yja € [0,1] vje PLaYaEA (13) 
Xija =0 Vi EN not allocated in j inside a (14) 
Xija = Yja foralli,j Ea (15) 
Liat Via = P (16) 


In this model, the cost function is illustrated in (5) as the sum of server initiation cost and replication 
cost. Constraints (6—9) enforce the capacity of servers in terms of storage, processing power, bandwidth, and 
workload, respectively. These constraints (6—9) state that the total storage space, processing power, 
bandwidth, and workload of all contents replicated at server j should not exceed this server capacity. 
The number of copies for content i inside a service area has to be less than or equal number of selected 
potential servers P in that service area as depicted in equation (10). In addition, the number of copies for 
content i should be bounded by its popularity value P; as depicted in equation (11). The integrality and non- 
negativity constraints are presented in equations (12-14). The contents should not be replicated on a potential 
server unless this potential location is selected as a server according to equation (15). In other words, the 
contents will not replicate on a site j unless this site is selected as a server. Finally, the number of servers that 
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should be selected in any service area should be bounded by a predefined number P as in (16) where P is a 
predefined number of locations in the service area. 


4.5. The complexity of the proposed optimization model 

The proposed optimization model for Content-Aware Resource Allocation problem is a 
combinatorial optimization problem and belongs to NP-complete class. We can prove the completeness of 
this problem by considering the whole network consists of a single service area with a single server, which 
allocates all the objects. Moreover, we assume that the resources cost, except the storage cost, equals zero. 
According to these assumptions, the proposed optimization problem will be written mathematically as. 


Minimize: 
Cr = Dien MS; (17) 
Subject to 
Pien MiXija SM Vi Ee PLaẹYaEA (18) 


The simplified form of this problem is similar to the well-known knapsack problem, which is an 
NP-complete problem [40]. For clarity, the knapsack problem tries to minimize/maximize a certain quantity 
with respect to some constraints. For instance, the knapsack problem aims at maximizing the obtained profit 
or minimizing the cost without exceeding the knapsack capacity. Similarly, CARA problem aims to minimize 
the allocation cost with respect to some enforced constraints. 


5. HYBRID GENETIC ALGORITHM FOR SOLVING CARA PROBLEM (HGA_CARA) 

There is no algorithm with a polynomial time complexity can solve the NP-complete combinatorial 
optimization problems. On the other side, applying the evolutionary algorithms is a good choice to solve such 
problems. Thus, we integrate the Genetic Algorithm with two heuristic repair algorithms to solve CARA 
problem. Such integration reduces the search space and improves the ability of the proposed model to 
produce optimal solutions efficiently. 


5.1. Heuristic repair algorithms 

The problem of producing infeasible chromosome can be solved by incorporating problem-specific 
knowledge. For CARA problem, we have proposed two heuristic repair methods. The first one, called 
Restricted Search Operator (RSO), is to choose a certain number of servers in each service area based on the 
constraint 16. The second one, called Heuristic Replica Repair (HRR) algorithm is proposed to control the 
replica placement constraints. 


5.2. Restricted search operator (RSO) 

In many combinatorial optimization problems, there is a need to deal with chromosomes that hold a 
specified number of ones; for example the problem of selecting a subset of candidate locations to find the 
optimal solution, as in CARA problem. In this case, the individuals, which have more or less than the 
specified number of ones, will be rejected. Thus, Restricted Search Operator (RSO) is proposed to deal with 
this constraint. According to [43], RSO was firstly proposed by Salcedo-Sanz in 2004; next it is considered 
an extra operator added to the conventional GA. The idea of RSO is that the individual generated from 
crossover and mutation operators x may hold p ones. These ones may be different from the desired 
number M, thus, if p < M the RSO adds M — P ones in randomly chosen positions. On the other side, 
ifp > M the RSO selects P — M ones randomly to be removed from the binary vector. The following 
Pseudo-code describes the Restricted Search Operator. 


Let M be the specified number of 1s in an individuals. 
For every generation of GA 
For every individual in the population 
Check the number of Is (p) 
If (p<M) add_ones(M-p) 
Else Remove_ones(p-M) 
end(Gif) 
end(individual) 
end(generation) 


Figure 2. Restricted search operator [42] 
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In CARA problem, the restricted search algorithm is adapted to choose a certain number of servers 
where each server is represented as a string of binary values instead of a set of genes. In other words, the 
restricted search operator in the resource allocation case will work on a set of bulks instead of a set of genes; 
each bulk in this case consists of a set of sequential genes. As will be explained through this paper, each 
potential location j in the service area a consists of a set of sequential genes starting from the gene number 
((a—1)*PLa+(j—1))*N +1 and ending with the gene number ((a — 1) * PLa + (j — 1))* N +N. 
The restricted search operator represents the constraint 16 as follows: depending on the predefined number of 
locations P, the restricted search operator will choose randomly P bulks in each service area to be 
unchangeable. After that, the rest bulks in the service area will be change to zero. Thus, if the value of all 
cells for any location equal zero, this location will not be selected to be server. After applying the restricted 
search operator (RSO), which determines the number of selected servers, HRR algorithm checks each 
chromosome by counting the number of replicas for each video in the selected servers that are produced from 
RSO process. 


5.3. Hybrid genetic algorithm for CARA problem (HGA_CARA) 

To solve CARA problem, the genetic algorithm is hybridized with two repair algorithms. In this 
hybrid model, the objective function is to minimize the hosting cost (i.e. equation 5) with respect to the listed 
constraints. 

In CARA problem, there are a set of candidate locations to construct servers in each service area. 
The aim is to select a subset of these locations to replicate the contents over them based on predefined 
criteria. For this problem, the binary encoding for the chromosomes is exploited. In such binary encoding, the 
chromosome is represented as a string of ones and zeros. This binary string represents the candidate 
solutions, in which each cell indicates for the replica position in the service area. The total length of the 
chromosome is A * PL, * N where A represents the number of service areas, PL, indicates to the number of 
candidate locations in the service area, and N denotes the number of contents. 

For instance, if the delivery network comprising of 2 service areas with two potential locations for 
each and the number of contents equals to 5, then the total length of chromosome equals to 20 genes, as 
depicted in Figure 3. Each cell in the chromosome holds a binary value Xija € {0,1} that indicates whether or 
not an object i is replicated at the server j inside the service area a. The position of any gene can be identified 
according to the following (20). 


gene(Xija) = ((a —1)*PL,+QG- 1)) *N+i (20) 


Figure 3. Chromosome representation for CARA problem 


For example, the gene 13 holds a replica for the 3rd content into the first server in the second service 
area a according to Figure 3. We can find this gene using equation (20). gene(x312) = ((2 —1)*2+ 
(1 - 1)) * 5+3 = 10 +3 = 13. The potential location j in the service area a holds the genes from 
((a—1)*PLa+(j—1))*N+1 to((a—1)*PLa+(j—1))*N+N. For instance, if there are 20 
contents to be replicated over a delivery network with 5 service areas, each area contains 4 servers. The total 
length of the CARA problem chromosome will be 4*5*20 = 400 genes. The third server in the second service 
area will be represented by the genes from the position 121 to the position 140. 

The initial population of this problem is generated randomly by selecting a random binary number 
for each gene in the chromosome. During the evolution process, the roulette wheel selection technique is 
implemented to select two chromosomes at each time. The uniform crossover operator with a standard 
crossover probability is applied on the selected pair. After that, the reproduced offspring are mutated, with an 
extremely small mutation probability, to ensure the diversity in the search space. During the reproduction 
process, the new offspring may lie in the region of unfeasible solutions; therefore, two heuristic repair 
algorithms are used to repair this offspring to be obeyed by the proposed constraints as depicted in Figure 4. 


Content-aware resource allocation model for IPTV delivery networks (Suliman M. Fati) 


378 o ISSN: 2088-8708 


Initialize 
population 


HRR alg. 
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Figure 4. Hybrid GA for CARA problem (HGA_CARA) 


6. THE EXPERIMENTAL RESULTS 

To evaluate the performance of the proposed content-aware resource allocation model (CARAM), 
an experiment using the proposed evolutionary approach HGA_CARA is conducted. Here, the numerical 
results are reported. The experiments have been conducted on a system model of 40 servers divided among 
10 service areas and 100 video contents. The predefined number of servers P equals 2. This means that the 
model has to choose two potential locations among four locations to become servers. These potential 
locations are associated with the cost of storage, processing power, and bandwidth resources. Moreover, the 
video contents have to be replicated on the selected servers in each service area during the experiment. In this 
section, the effect of content status on the resource allocation process is investigated. The effect of content 
status is studied using two popularity distributions. 

The aim of using two popularity distributions in this experiment is to study the impact of the change 
in the popularity distributions on resource allocation process. This impact is measured in terms of storage, 
CPU consumption, bandwidth, and the number of popular contents. The popularity distributions have been 
sampled using two different equations. The first popularity distribution follows the normal density 
function [10]. This popularity distribution has been sampled for 100 contents and depicted in equation (21). 
We refer to this distribution by the symbol (DF). 


e-((i-1)4)2/2 
Pi = Sa ee (21) 

The other popularity distribution has been sampled for the same content size according to Zipf’s law 
distribution. Like the symbol k in Zipf distribution, the symbols A represents for the degree of skewness in 
the popularity distribution (DF). The expected load of contents is estimated using the proposed IPTV content 
status model for the two popularity distributions. 

Figure 5 depicts the two popularity distributions with different skewness values 0.01, 0.04, and 0.08. 
According to this figure, the curve of small skewness values (i.e. 0.01 and 0.04) indicates that the contents 
popularity are distributed in an uniform manner, which means that all contents have the same popularity 
value. However, the curve tends to be more skewed when the skewness value increased. Notably, DF 
distribution tends to be more skewed than Zipf’s distribution in the figure when the skewness value increased 
(equals 0.08 as in the figure). Consequently, the number of popular contents in DF distribution is less than 
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the number of popular contents in Zipf’s distribution. For instance, Figure 4 shows that 36% and 66% of 
contents are popular for DF and Zipf’s distributions, respectively. The same thing can be observed in Figures 
6 and 7 where Figure 6 shows that only 6% of the contents are popular at the skewness value equals 1 for DF 
distribution, whereas the other contents having the popularity value zero. 


—— df0.01 —— df0.04 
—— df0.1 —— df0.4 
——<dfl 


Figure 1. DF popularity distribution with different skewness values 


On the other side, at the same skewness value, Figure 7. shows that more than 16% of contents are 
popular with a long tail of the curve reaches 70% of contents for Zipf’s law distribution. The popularity value 
of these contents is low, but it does not equal zero. That means that Zipf’s distribution replicates more 
contents than DF distribution. 

Therefore, it can be concluded that at the higher skewness values, a small percentage of contents are 
popular. However, in DF distribution, this percentage is less compared to Zipf’s distribution. This behavior 
can be interpreted as follows. In the DF distribution, the popularity degree divided among the contents in a 
smooth manner wherein the contents popularity values are close to each other at the low skewness degree. 
However, when the skewness value tends to increase, the disparity in popularity degree of contents becomes 
clearer. Unlike DF, the popularity curve starts with popular contents and then cascaded down gradually. 
On the other side, the DF distribution looks similar to Zipf’s distribution at the lower skewness degree. 
However unlike the Zipf’s distribution, the DF popularity curve lacks the smooth gradient at higher skewness 
degree. In general, the popularity degree of popular contents is higher in DF distribution than of those in 
Zipf’s distribution at the same skewness values. 
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Figure 7. Zipf’s popularity distributions with different skewness values 
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Based on this observation, the load of the popular contents in DF distribution is higher than the load 
of the same popular contents in Zipfs distribution at the same degree. This behavior of load distribution is 
depicted in the Figures 8. and 9. where the load of the first content is 800 and 5000 for Zipf and DF 
distributions, respectively. As mentioned through this thesis, the popularity distribution is not the only factor 
that affects the anticipated load distribution. However, it is the most influential factor. Thus, the anticipated 
load distribution of the contents takes the same behavior of the popularity distribution, but with some 
deviations. Figure 8. depicts the load distribution for DF distribution at low skewness values. This figure 
depicts that the popular contents take larger workload portion than the other contents. This is due to the 
highly skew of DF distribution as pointed out in Figure 6. 
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Figure 8. Expected load according to DF distribution 


Accordingly, most of the incoming workload will be redirected to a small portion of contents. 
Unlike Zipf’s law, the load distribution following DF distribution will be more skewed than Zipf’s 
distribution. Such distribution follows the popularity skewness. For instance, a small percentage of contents 
utilize a higher portion of the load where these popular contents take a higher popularity degree. On the other 
side, the other contents having very low popularity degrees take the rest portion of the load, which is very 
low or in many cases equals zero. On the other hand, the expected load of contents that follow Zipf’s law 
distribution will be distributed more smoothly than DF distribution as depicted in Figure 9. However, 
definitely the popular contents will take the lion’s share. 
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Figure 9. Expected load according to Zipf’s distribution 


Based on the different skew behaviors of these two popularity distributions, the effect of content 
status on the aspect of resources allocation process with accordance to these two popularity distribution is 
studied. The effect of content status on resource allocation process is investigated and reported below in 
terms storage, bandwidth, and the number of popular contents. 

The expected load at different skewness values in the service area is depicted in Figure 10. In this 
figure, the expected load of DF distribution is higher than the expected load of Zipf distribution. This is due 
to that the expected load of contents in DF distribution is higher than the contents in Zipf distribution. 
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Figure 10. The total load for one service area 


What is apparent from the same figure is that in case of the expected load of contents following Zipf 
distribution, the expected load increases linearly when the skewness value increases. On the other side, the 
expected load of contents following DF distribution follows the same behavior at the low skewness values. 
However, at the higher skewness values, the expected load alters its behavior inversely as it decreases when 
the skewness value increases. The behavior of DF can be interpreted by the sharp disparity in the popularity 
degree at higher skewness values. Such sharp disparity leads to store very small percentage of contents with 
high popularity values. Moreover, in case of redirecting any content with lower popularity, a significant 
portion of the workload can be redirected. Such redirection leads to decrease the load on the servers. On the 
other side, the contents with low popularity values may be redirected to other service areas. Redirecting these 
contents leads to reduce the load on servers. Such reduction in load interprets the dwindling in the total load 
of DF distribution at higher skewness values. 

Furthermore, the required bandwidth of any content in the service area is a function of both the 
active streams at the server and the bit rate of the streamed contents. Thus, the expected bandwidth for the 
delivery network follows the workload distribution if we assume that the bit rate of all contents is fixed. 
The expected bandwidth for one service area for bit rate=9.8 Mbps is depicted in Figure 11. 

Figure 12 plots the exploited storage versus different skewness degrees for both popularity 
distributions. The figure shows that the exploited storage space tends to decrease when the skewness degree 
increases. However, the storage space of contents following DF distribution decreases rapidly more than the 
storage space of those content following Zipf distribution. As depicted in Figure 12, the storage space of 
contents following Zipf distribution decreases from 58 GB to 44 GB while the storage space of contents 
following DF distribution decreases from 82 GB to 22 GB for the skewness values (0.01—2). 
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Figure 11. The expected bandwidth for one service area 
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Figure 12. The exploited storage versus different skewness degrees 
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This behavior can be interpreted as follows. In case of DF distribution, the contents are close to each 
other in their popularity, which leads to replicate more contents in the service area. Moreover, the number of 
replica of each video is high because of the popularity values are closer to each other than Zipf’s distribution 
at the same low skewness values, which leads to increase the number of replica for the replicated contents, as 
depicted in Figure 13. Consequently, the exploited storage space becomes high. From the same Figure, the 
number of replicas for Zipf distribution-based replicated contents is less than that of DF distribution, which 
leads to exploit a less storage space for Zipf distribution contents. 

In contrast, the exploited storage space for both types is decreased at the higher skewness value 
(e.g. at skewness values equals 1 and 2) due to that only very small percentage of contents will be replicated 
in the service area, as depicted in Figure 14. However, DF exploits a storage space slightly less than Zipf 
distribution at higher skewness values. This is due to that DF replicated a percentage of contents slightly 
smaller than the percentage of contents replicated by Zipf’s distribution. 

Finally, the number of stored contents in the service area is measured and depicted in Figure 15. 
From this figure, it can be noticed that the number of stored contents in the service area decreases when the 
skewness value increases. However, the decreasing for DF distribution is larger than that for Zipf 
distribution. The interpretation for this phenomenon refers to the fact that the skewness in DF distribution is 
higher than the skewness of Zipf distribution. 
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Figure 15. Percentage of popular contents for both distributions 
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7. CONCLUSION 

Integrating the replica placement with resource allocation is a necessity to cope with the fluctuation 
of demand on IPTV services. In this paper, Content-Aware Resource Allocation Model (CARAM) is 
proposed. CARAM model considers the contents’ status to decide the required resources. For this purpose, 
the replica placement problem and the resource allocation problem have been integrated aiming to allocate 
the resources depending on the necessitated replication scheme. The content-aware resource allocation 
problem is studied and formulated as a constrained binary optimization problem. The proposed model has 
been solved using Hybrid Genetic Algorithm. The experiments have been conducted on two popularity 
distributions. The experimental results showed the impact of content status on the resource allocation 
process. Thus, considering the content’s status during the resource allocation process helps in allocating the 
resources according to the demand without wasting the resources. The future work for this direction will 
focus on building online optimization method for allocating the resources on the fly. Such online 
optimization helps in allocation the resources according to the real demand. 
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